1. Introduction

Distances between probability measures have been used as valuable evaluation
and optimization criteria in signal selection [8], pattern recognition [22],
universal coding [16, 18], and statistical robustness [21].

The choice of the proper distance must be the result of a carefully weighed
decision, where both the calculability and the power of the distance are considered.

An additional factor that is important for the choice of the proper distance
in statistically ill-defined environments is the sensitivity of the distance to
changes in the underlying measures.

In section 3 of this paper a comparative evaluation of the Bhattacharyya,
I-divergence, Vasershtein, variational and Lévy distances is presented.

In section 4, the above distances are used for the linear reduction of
n-dimensional data coming from two Gaussian populations to one feature.

2.  Preliminaries 

Let $(\Omega, G, \mu)$ be a separable, complete measure space, where $G$ is a
σ-algebra of sets in $\Omega$ and $\mu$ is the measure. Let us now define two different
measures $\mu_1, \mu_2$ on $(\Omega, G)$ such that [condition illegible in the source].

If $A, B$ are sets belonging to the σ-algebra $G$, and if $\rho(A,B)$ is a penalty
function, or distortion measure, defined on $G$, then the Prokhorov and Vasershtein
distances between the two measures are defined as follows:

Prokhorov distance:

[expression illegible in the source]    (1)

Vasershtein distance:

$$d_{V,\rho}(\mu_1,\mu_2) = \inf_{r(\cdot,\cdot)} E_r\{\rho(A,B)\} \qquad (2)$$

where the infimum is over all joint measures $r(A,B)$ with $\mu_1(A)$, $\mu_2(B)$ marginals.

The distortion measure considered is incorporated in the definition of the
Prokhorov and Vasershtein distances. Also, the Prokhorov distance is very
sensitive to changes in the $\mu$ measures, while for some choices of distortion
measure $\rho(\cdot,\cdot)$ the Vasershtein distance is a function of second order
statistical characteristics only (as shown in [16] and discussed in section 3). Finally,
while the Prokhorov distance is bounded from above by [bound illegible in the source],
the Vasershtein distance is bounded only if the distortion measure $\rho(A,B)$ is
bounded for all $A, B \in G$.

If the distortion measure $\rho(A,B)$ is a metric and is such that $\rho(A,A) = 0$,
then it was shown by Dobrushin ([13], p. 496) that $d_{V,\rho}(\mu_1,\mu_2)$ is also a metric
and $d_{V,\rho}(\mu_1,\mu_2) = 0$ if and only if $\mu_1 = \mu_2$. The Prokhorov distance has the same
properties for any choice of $\rho(\cdot,\cdot)$. In Dobrushin the relationship

[expression illegible in the source]    (3)

is also stated. A formal proof of this relationship is given in appendix A.


If the space $\Omega$ is a Euclidean space, and $F_1, F_2$ are cumulative distributions,
the Prokhorov distance reduces to the Lévy distance. For $\Omega$ the real line, and
for $\mu_1, \mu_2$ being the one-dimensional cumulative distributions $F_1(x), F_2(x)$
correspondingly, the Lévy distance is given by the following expression:

$$d_{L,\rho}(F_1,F_2) = \inf\{\epsilon : F_2(\min(y:\rho(x,y)\le\epsilon)) - \epsilon \le F_1(x) \le F_2(\max(y:\rho(x,y)\le\epsilon)) + \epsilon;\ \forall x\} \qquad (4)$$

The Lévy distance has the same properties as the Prokhorov distance. Specifically,
it is a metric, it lies between 0 and 1, and it is equal to zero if and only
if $F_1 = F_2$. It is also obviously true that the Lévy distance is very sensitive to
changes in the distributions $F_1, F_2$.

A set of distances that are also very sensitive to distribution changes
and are defined on Euclidean spaces are the Kolmogorov, variational, I-divergence
and Bhattacharyya distances. The last three assume existence of non-cumulative
distributions (density functions). If we concentrate on one-dimensional Euclidean
spaces and denote by $f_1, f_2$ the densities of $\mu_1, \mu_2$ correspondingly, we define:

Kolmogorov distance:

$$d_K(F_1,F_2) = \sup_x |F_1(x) - F_2(x)| \qquad (5)$$

Variational distance:

$$d_{Vr}(f_1,f_2) = \int_{-\infty}^{\infty} |f_1(x) - f_2(x)|\,dx \qquad (6)$$

I-divergence distance:

$$d_I(f_1,f_2) = \frac{1}{2}\int_{-\infty}^{\infty} [f_1(x) - f_2(x)] \log\frac{f_1(x)}{f_2(x)}\,dx \qquad (7)$$

Bhattacharyya distance:

$$d_B(f_1,f_2) = \ln\left[\int_{-\infty}^{\infty} f_1^{1/2}(x)\, f_2^{1/2}(x)\,dx\right]^{-1} \qquad (8)$$

The Bhattacharyya distance is well known to provide an upper bound on the
probability of erroneous decision between $f_1$ and $f_2$. Also, the variational distance,
in addition to being oversensitive to density changes, is also very demanding. A
more detailed comparative evaluation of the six distances covered in this preliminary
discussion is presented in the following section.
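As a rough numerical illustration of definitions (7) and (8), the following Python sketch evaluates the I-divergence and Bhattacharyya distances by trapezoidal integration for two unit-variance Gaussian densities whose means differ by one. The densities, the integration range and the helper names (`gauss_pdf`, `integrate`, `i_divergence`, `bhattacharyya`) are assumptions made for this example and are not taken from the paper. For equal-variance Gaussians with mean difference Δ and variance σ², the standard closed forms are Δ²/(2σ²) and Δ²/(8σ²), which the numerical values should approximate.

```python
import math

def gauss_pdf(x, mean, std):
    """Gaussian density with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def integrate(func, lo, hi, steps=20000):
    """Composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * h)
    return total * h

def i_divergence(f1, f2, lo, hi):
    # (7): one half of the integral of (f1 - f2) * log(f1 / f2)
    return 0.5 * integrate(
        lambda x: (f1(x) - f2(x)) * math.log(f1(x) / f2(x)), lo, hi)

def bhattacharyya(f1, f2, lo, hi):
    # (8): minus the logarithm of the overlap integral of sqrt(f1 * f2)
    return -math.log(integrate(lambda x: math.sqrt(f1(x) * f2(x)), lo, hi))

# Assumed example: means 0 and 1, common standard deviation 1.
f1 = lambda x: gauss_pdf(x, 0.0, 1.0)
f2 = lambda x: gauss_pdf(x, 1.0, 1.0)

d_i = i_divergence(f1, f2, -12.0, 13.0)
d_b = bhattacharyya(f1, f2, -12.0, 13.0)
print(d_i, d_b)
```

The integration range is kept finite so that neither density underflows to zero inside the logarithm; wider ranges would need a guard for that case.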

3.  Comparative  evaluation 

The relationship between the Prokhorov and Vasershtein distances has already
been given by (3). The following lemma describes a simple relationship between
the Lévy, Kolmogorov, and variational distances, where multidimensional distributions
are in general considered.

Lemma 1

For $Q_1, Q_2$ multidimensional cumulative distributions and $f_1, f_2$ their corresponding
densities, the following ranking is true:

$$d_{L,\rho}(Q_1,Q_2) \le d_K(Q_1,Q_2) \le d_{Vr}(f_1,f_2)$$

Proof:

1. For the relationship between the Lévy and Kolmogorov distances, let

$$\delta = \sup_X |Q_1(X) - Q_2(X)|.$$

Then,

$$Q_1(X) - Q_2(X) \le |Q_1(X) - Q_2(X)| \le \delta \le Q_2(\cup Y:\rho(X,Y)\le\delta) - Q_2(X) + \delta;\ \forall X$$

$$Q_2(X) - Q_1(X) \le |Q_2(X) - Q_1(X)| \le \delta \le Q_1(\cup Y:\rho(X,Y)\le\delta) - Q_1(X) + \delta;\ \forall X$$

$$\Rightarrow\quad Q_1(X) \le Q_2(\cup Y:\rho(X,Y)\le\delta) + \delta;\ \forall X, \qquad Q_2(X) \le Q_1(\cup Y:\rho(X,Y)\le\delta) + \delta;\ \forall X$$

Therefore, $\delta$ is a candidate for $d_{L,\rho}(Q_1,Q_2)$, so that $d_{L,\rho}(Q_1,Q_2) \le \delta = d_K(Q_1,Q_2)$.

2. For the relationship between the variational and Kolmogorov distances:

$$|Q_1(X) - Q_2(X)| = \left|\int_{-\infty}^{X} [f_1(Y) - f_2(Y)]\,dY\right| \le \int_{-\infty}^{X} |f_1(Y) - f_2(Y)|\,dY \le \int_{-\infty}^{\infty} |f_1(Y) - f_2(Y)|\,dY$$

$$\Rightarrow\quad d_K(Q_1,Q_2) = \sup_X |Q_1(X) - Q_2(X)| \le d_{Vr}(f_1,f_2)$$
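The ranking of Lemma 1 can be checked numerically in one dimension. The sketch below is only an illustration under stated assumptions: the two Gaussian distributions, the grid, the choice ρ(x,y) = |x-y| and all function names are chosen for the example. The Lévy distance is found by bisection on ε, using the fact that the two-sided condition in its definition stays satisfied when ε grows.

```python
import math

def gauss_cdf(x, mean, std):
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def gauss_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def kolmogorov(F1, F2, grid):
    # Largest gap between the two cumulative distributions on the grid.
    return max(abs(F1(x) - F2(x)) for x in grid)

def variational(f1, f2, grid):
    # Trapezoid rule for the integral of |f1 - f2|.
    total = 0.0
    for a, b in zip(grid[:-1], grid[1:]):
        total += 0.5 * (abs(f1(a) - f2(a)) + abs(f1(b) - f2(b))) * (b - a)
    return total

def levy(F1, F2, grid, tol=1e-6):
    # With rho(x, y) = |x - y|: smallest eps such that
    # F2(x - eps) - eps <= F1(x) <= F2(x + eps) + eps at every grid point.
    def feasible(eps):
        return all(F2(x - eps) - eps <= F1(x) <= F2(x + eps) + eps
                   for x in grid)
    lo, hi = 0.0, 1.0  # eps = 1 is always feasible for distributions
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi

grid = [-10.0 + 0.01 * k for k in range(2001)]
F1 = lambda x: gauss_cdf(x, 0.0, 1.0)
F2 = lambda x: gauss_cdf(x, 0.5, 1.5)
f1 = lambda x: gauss_pdf(x, 0.0, 1.0)
f2 = lambda x: gauss_pdf(x, 0.5, 1.5)

d_l = levy(F1, F2, grid)
d_k = kolmogorov(F1, F2, grid)
d_v = variational(f1, f2, grid)
print(d_l, d_k, d_v)
```

All three values are grid approximations, so the comparison is meaningful only up to the grid spacing and the bisection tolerance.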

It can be seen from the above lemma that the Lévy distance is the weakest, the
variational is the strongest, and the Kolmogorov lies in between.

The Bhattacharyya and I-divergence distances constitute a separate group
that is not directly comparable to the previously mentioned distances. In addition,
as explained by Kailath [8], the Bhattacharyya and I-divergence distances
are not directly comparable to each other either.

At this point it is interesting to make the peripheral comment that the
nonsymmetric I-divergence distance given by the expression

$$d_I(f_1,f_2) = \int f_1(Y) \log\frac{f_1(Y)}{f_2(Y)}\,dY$$

is also the discrimination or relative entropy of $f_1$ with respect to $f_2$ defined
by Wyner [19] for a two-receiver broadcast channel. The symmetric I-divergence
distance in (7) can then be looked at as the mutual relative entropy between the
two receivers of the same channel.

Continuing with the comparative evaluation of the distances, we present two
lemmas involving the Prokhorov, generalized Kolmogorov and Vasershtein distances.
The conclusion from these two lemmas is that for particular choices of penalty or
distortion measure $\rho(\cdot,\cdot)$ both the Prokhorov and Vasershtein distances
degenerate to the Kolmogorov one.


Lemma 2

Let $(\Omega,G,\mu_1)$ and $(\Omega,G,\mu_2)$ be two measure spaces with measures such that:

[condition illegible in the source]

where $S$ is some real number. Let also

$$\rho(A,B) = \begin{cases} S, & A \ne B \\ 0, & A = B \end{cases}$$

Then

$$d_{P,\rho}(\mu_1,\mu_2) = d_K(\mu_1,\mu_2)$$

Lemma 3

For measure spaces and distortion measure $\rho(\cdot,\cdot)$ as in lemma 2, and $S = 1$, it is also
true that

$$d_{V,\rho}(\mu_1,\mu_2) = d_K(\mu_1,\mu_2)$$

The result expressed by lemma 3 was first found by Dobrushin [13]. An alternative
proof for this lemma and the proof of lemma 2 are presented in appendix A.

Another property of the distances that is valuable for any optimization
problem involving them is convexity. It is easy to show that the variational,
Kolmogorov and I-divergence distances are convex ∪ on closed linear manifolds of
either of the two distributions involved.

The Vasershtein distance for distributions and the Lévy distance are
considered interesting cases here and are examined in detail.


Let us first consider the Vasershtein distance for distributions, and
denote by $Q_1(X_n), Q_2(Y_n)$ two n-dimensional distributions at $X_n$ and $Y_n$ respectively.
In addition, denote by $R_{12}(X_n,Y_n)$ any 2n-dimensional distribution with $Q_1, Q_2$
marginals. For such distributions the Vasershtein distance is given by the following
expression:

$$d_{V,\rho}(Q_1,Q_2) = \inf_{R_{12}(\cdot,\cdot)} E_{R_{12}}\{\rho(X_n,Y_n)\} \qquad (11)$$

The following lemma can then be stated.


Lemma 4

For distortion measure $\rho(\cdot,\cdot) \ge 0$, the Vasershtein distance (11) is
convex ∪ on any closed linear manifold of either $Q_1$ or $Q_2$.

Proof

Let $B_{12}$ be the space of all joint distributions $R(\cdot,\cdot)$ that have $Q_1, Q_2$ marginals,
and let $B_{13}$ be defined likewise for $Q_1, Q_3$. Consider $R_1(\cdot,\cdot) \in B_{12}$ and
$R_2(\cdot,\cdot) \in B_{13}$. Then, for every $h \in [0,1]$, the distribution

$$R(\cdot,\cdot) = hR_1(\cdot,\cdot) + (1-h)R_2(\cdot,\cdot)$$

has marginals $Q_1$ and $hQ_2 + (1-h)Q_3$. Therefore, for every $R_1(\cdot,\cdot) \in B_{12}$, $h \in [0,1]$,
$R_2(\cdot,\cdot) \in B_{13}$, the distribution

$$R(\cdot,\cdot) = hR_1(\cdot,\cdot) + (1-h)R_2(\cdot,\cdot) \qquad (12)$$

is a member of the space $B_{1h23}$ of joint distributions that have $Q_1$, $hQ_2 + (1-h)Q_3$
marginals. In other words, the distributions expressed by (12) constitute a subspace
of the space $B_{1h23}$. Therefore,

$$d_{V,\rho}(Q_1, hQ_2 + (1-h)Q_3) = \inf_{R \in B_{1h23}} E_R\{\rho(X_n,Y_n)\} \le \inf_{R_1 \in B_{12},\ R_2 \in B_{13}} E_{hR_1 + (1-h)R_2}\{\rho(X_n,Y_n)\}$$

where the second infimum is taken over the linear family described by (12).
But since $\rho(\cdot,\cdot) \ge 0$, we can write from above:

$$d_{V,\rho}(Q_1, hQ_2 + (1-h)Q_3) \le h \inf_{R_1 \in B_{12}} E_{R_1}\{\rho(X_n,Y_n)\} + (1-h) \inf_{R_2 \in B_{13}} E_{R_2}\{\rho(X_n,Y_n)\} = h\,d_{V,\rho}(Q_1,Q_2) + (1-h)\,d_{V,\rho}(Q_1,Q_3)$$

and the proof is here complete, since symmetric analysis leads to a similar result
for the other distribution involved in the Vasershtein distance.
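A small numerical illustration of the convexity stated in Lemma 4 is sketched below for discrete distributions on four equally spaced points of the real line and ρ(x,y) = |x-y|. It relies on the standard fact that, in this one-dimensional case, the infimum over joint distributions equals the sum of the absolute gaps between the two cumulative distributions times the spacing. The probability vectors and function names are assumptions made for the example.

```python
def transport_cost_on_line(p, q, spacing=1.0):
    """Vasershtein distance for rho(x, y) = |x - y| between two probability
    vectors on the same equally spaced points (CDF-gap formula)."""
    cost, cdf_p, cdf_q = 0.0, 0.0, 0.0
    for p_i, q_i in zip(p[:-1], q[:-1]):
        cdf_p += p_i
        cdf_q += q_i
        cost += abs(cdf_p - cdf_q) * spacing
    return cost

def mixture(q2, q3, h):
    """The convex combination h * q2 + (1 - h) * q3."""
    return [h * a + (1.0 - h) * b for a, b in zip(q2, q3)]

# Hypothetical distributions Q1, Q2, Q3 on the points 0, 1, 2, 3.
q1 = [0.25, 0.25, 0.25, 0.25]
q2 = [0.40, 0.10, 0.10, 0.40]
q3 = [0.10, 0.40, 0.40, 0.10]

gaps = []
for h in (0.0, 0.25, 0.5, 0.75, 1.0):
    lhs = transport_cost_on_line(q1, mixture(q2, q3, h))
    rhs = (h * transport_cost_on_line(q1, q2)
           + (1.0 - h) * transport_cost_on_line(q1, q3))
    gaps.append(rhs - lhs)  # nonnegative when the convexity inequality holds
print(gaps)
```

The vectors are chosen so that the inequality is strict at h = 0.5, where the mixture coincides with Q1.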
We should make here the additional observation that the result of the
above lemma can be extended to arbitrary measures for $\rho(\cdot,\cdot) \ge 0$.

Also, we emphasize here that if the distortion measure $\rho(X_n,Y_n)$ is
symmetric with respect to $X_n$ and $Y_n$, the Vasershtein distance is also symmetric
with respect to the distributions $Q_1, Q_2$. That is, then $d_{V,\rho}(Q_1,Q_2) = d_{V,\rho}(Q_2,Q_1)$.

In the study of the Lévy distance for convexity, it became evident that
such a property is secured only if the distance is redefined on a closed interval
of the distribution domain. The reason that such a redefinition is necessary is
that convexity of the underlying distributions is then required, and such convexity
does not hold on the whole $(-\infty,\infty)$ for nontrivial distributions. That will be
clear in the following detailed discussion.

To make the discussion as meaningful as possible we will restrict ourselves
to one-dimensional distributions $F$. The arguments and the results can be easily
extended to multidimensional spaces.

Let us consider two constants $a, b$ such that $a < b$ and define the Lévy distance
in the following way:

$$d_{L,\rho}(F_1,F_2) = \inf\{\epsilon : F_2(\min(y:\rho(x,y)\le\epsilon)) - \epsilon \le F_1(x) \le F_2(\max(y:\rho(x,y)\le\epsilon)) + \epsilon;\ \forall x \in [a,b]\} \qquad (13)$$

What is implied in the definition (13) is that the domain of interest is
$[a,b]$, which means that the values of $F_1, F_2$ in the remaining domain are either kept
fixed or they have no influence on the system under consideration.

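Now, we can present the following lemma:

Lemma 5

The Lévy distance expressed by (13) is convex ∪ on the closed linear
manifold of distributions convex ∩ on $[a-1,b+1]$.

Proof:

We will prove the lemma for $F_2$ only. Due to symmetry the proof for $F_1$
will be the same.

Let $F_1, F_{21}, F_{22}$ be three distributions that are ∩ convex on $[a,b]$, and form
the distribution $F_{2h} = hF_{21} + (1-h)F_{22}$; $0 \le h \le 1$. The distribution $F_{2h}$ is also
convex ∩ on $[a,b]$. We can write, as discussed in [21]:

$$d_{L,\rho}(F_1,F_{21}) = \inf\{\epsilon : F_{21}(x) - [F_1(\max(y:\rho(x,y)\le\epsilon)) + \epsilon] \le 0;\ F_{21}(x) - [F_1(\min(y:\rho(x,y)\le\epsilon)) - \epsilon] \ge 0;\ \forall x \in [a,b]\} = \epsilon_1$$

$$d_{L,\rho}(F_1,F_{22}) = \inf\{\epsilon : F_{22}(x) - [F_1(\max(y:\rho(x,y)\le\epsilon)) + \epsilon] \le 0;\ F_{22}(x) - [F_1(\min(y:\rho(x,y)\le\epsilon)) - \epsilon] \ge 0;\ \forall x \in [a,b]\} = \epsilon_2 \qquad (14)$$

$$d_{L,\rho}(F_1,F_{2h}) = \inf\{\epsilon : h[F_{21}(x) - [F_1(\max(y:\rho(x,y)\le\epsilon)) + \epsilon]] + (1-h)[F_{22}(x) - [F_1(\max(y:\rho(x,y)\le\epsilon)) + \epsilon]] \le 0;$$
$$\qquad h[F_{21}(x) - [F_1(\min(y:\rho(x,y)\le\epsilon)) - \epsilon]] + (1-h)[F_{22}(x) - [F_1(\min(y:\rho(x,y)\le\epsilon)) - \epsilon]] \ge 0;\ \forall x \in [a,b]\} \qquad (15)$$

Since $d_{L,\rho}(F_1,F_{21}) = \epsilon_1$, for every $\delta > 0$ there is some $\epsilon_{01}$: $\epsilon_1 \le \epsilon_{01} < \epsilon_1 + \delta$ that satisfies:

$$F_{21}(x) - [F_1(\max(y:\rho(x,y)\le\epsilon_{01})) + \epsilon_{01}] \le 0;\quad F_{21}(x) - [F_1(\min(y:\rho(x,y)\le\epsilon_{01})) - \epsilon_{01}] \ge 0;\quad \forall x \in [a,b] \qquad (16)$$

Similarly, there is some $\epsilon_{02}$: $\epsilon_2 \le \epsilon_{02} < \epsilon_2 + \delta$ that satisfies:

$$F_{22}(x) - [F_1(\max(y:\rho(x,y)\le\epsilon_{02})) + \epsilon_{02}] \le 0;\quad F_{22}(x) - [F_1(\min(y:\rho(x,y)\le\epsilon_{02})) - \epsilon_{02}] \ge 0;\quad \forall x \in [a,b] \qquad (17)$$

From (16) and (17) it is implied that:

$$h[F_{21}(x) - [F_1(\max(y:\rho(x,y)\le\epsilon_{01})) + \epsilon_{01}]] + (1-h)[F_{22}(x) - [F_1(\max(y:\rho(x,y)\le\epsilon_{02})) + \epsilon_{02}]] \le 0$$
$$h[F_{21}(x) - [F_1(\min(y:\rho(x,y)\le\epsilon_{01})) - \epsilon_{01}]] + (1-h)[F_{22}(x) - [F_1(\min(y:\rho(x,y)\le\epsilon_{02})) - \epsilon_{02}]] \ge 0;\quad \forall x \in [a,b] \qquad (18)$$

or that:

$$hF_{21}(x) + (1-h)F_{22}(x) \le hF_1(\max(y:\rho(x,y)\le\epsilon_{01})) + (1-h)F_1(\max(y:\rho(x,y)\le\epsilon_{02})) + [h\epsilon_{01} + (1-h)\epsilon_{02}]$$
$$hF_{21}(x) + (1-h)F_{22}(x) \ge hF_1(\min(y:\rho(x,y)\le\epsilon_{01})) + (1-h)F_1(\min(y:\rho(x,y)\le\epsilon_{02})) - [h\epsilon_{01} + (1-h)\epsilon_{02}];\quad \forall x \in [a,b] \qquad (19)$$

Due to the ∩ convexity of $F_1$ on $[a-1,b+1]$ we have:

$$F_1(h\max(y:\rho(x,y)\le\epsilon_{01}) + (1-h)\max(y:\rho(x,y)\le\epsilon_{02})) = F_1(x + h\rho^{-1}(\epsilon_{01}) + (1-h)\rho^{-1}(\epsilon_{02}))$$
$$\qquad \ge hF_1(x + \rho^{-1}(\epsilon_{01})) + (1-h)F_1(x + \rho^{-1}(\epsilon_{02})) \qquad (20)$$

where $\rho^{-1}(\epsilon) = \max(y:\rho(x,y)\le\epsilon) - x$.

Substituting (20) in the first part of (19) we obtain:

$$hF_{21}(x) + (1-h)F_{22}(x) \le F_1(x + h\rho^{-1}(\epsilon_{01}) + (1-h)\rho^{-1}(\epsilon_{02})) + h\epsilon_{01} + (1-h)\epsilon_{02};\quad \forall x \in [a,b] \qquad (21)$$

The second part of (19) can be treated the same way, as discussed in [21].
Since $\rho^{-1}(\epsilon) \le \epsilon$, it is then directly derived from (21) that

$$d_{L,\rho}(F_1,F_{2h}) < \delta + h\epsilon_1 + (1-h)\epsilon_2;\quad \forall \delta > 0,\ \rho(x,y) = |x-y|$$

Therefore, $d_{L,\rho}(F_1,F_{2h}) \le h\,d_{L,\rho}(F_1,F_{21}) + (1-h)\,d_{L,\rho}(F_1,F_{22})$ and the
proof is complete.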
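The statement of Lemma 5 can be probed numerically. The sketch below is an illustration only: it assumes ρ(x,y) = |x-y|, takes [a,b] = [1,3], and uses three exponential distributions 1 - exp(-rate·x), which are convex ∩ on [0,∞) and hence on [a-1,b+1] = [0,4]. The rates, the grid and the function names are assumptions made for the example.

```python
import math

def levy_on_interval(F1, F2, a, b, points=401, tol=1e-7):
    """Levy distance restricted to [a, b] with rho(x, y) = |x - y|,
    checked on a grid and located by bisection on eps."""
    xs = [a + (b - a) * k / (points - 1) for k in range(points)]
    def feasible(eps):
        return all(F2(x - eps) - eps <= F1(x) <= F2(x + eps) + eps
                   for x in xs)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi

def exp_cdf(rate):
    # 1 - exp(-rate * x) has a negative second derivative for x >= 0.
    return lambda x: 1.0 - math.exp(-rate * x) if x > 0.0 else 0.0

a, b = 1.0, 3.0  # so that [a - 1, b + 1] = [0, 4]
F1, F21, F22 = exp_cdf(1.0), exp_cdf(0.5), exp_cdf(2.0)

d21 = levy_on_interval(F1, F21, a, b)
d22 = levy_on_interval(F1, F22, a, b)

slack = {}
for h in (0.25, 0.5, 0.75):
    mix = lambda x, h=h: h * F21(x) + (1.0 - h) * F22(x)
    # Nonnegative slack means the convexity inequality holds for this h.
    slack[h] = h * d21 + (1.0 - h) * d22 - levy_on_interval(F1, mix, a, b)
print(d21, d22, slack)
```

A positive slack for the sampled values of h is consistent with the lemma but of course does not prove it.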


We can observe here that a distribution $F(x)$ that is convex ∩ on $[a-1,b+1]$ is
a distribution with negative second derivative in $[a-1,b+1]$, which corresponds to a
density monotonically decreasing on the same interval.

As a conclusion from lemma 5 we can observe that, due to the convexity expressed
there, given any distribution $F$ there is a unique best approximation of it in the
Lévy distance sense given by (13), where the approximation is taken from the linear
manifold of distributions that are convex ∩ inside a certain closed interval
$[a-1,b+1]$.

As we see in the following section, the convexity of the distances, besides
being valuable for best approximation problems, is also useful in the search for
optimal linear data transformations under restrictions.

4. The distances and data linear transformations

Let $(\Omega,G)$ be a space with a σ-algebra defined on it, and let $T$ be a
transformation on the elements $\omega$ of the space $\Omega$. The transformed space will be
called $T\Omega$, and the transformed σ-algebra will be denoted $TG$.

Corollary 1

$TG$ is a σ-algebra if $\Omega$ is a countable space.

Proof

Let $A, B \in G$ and denote by $TA$, $TB$ the transformed sets. Then $TA \cup TB$ and $TA \cap TB$
are obviously equal to $T(A \cup B)$ and $T(A \cap B)$ correspondingly. But $A, B, A \cup B, A \cap B \in G$.
Therefore, $TA, TB, T(A \cup B), T(A \cap B) \in TG$. The same is easily extended to
$\bigcup_{i=1}^{K} TA_i$ and $\lim_{n\to\infty} \bigcup_{i=1}^{n} TA_i$.

We want to point out here that if $\Omega$ is not countable the statement of the
corollary is not necessarily true. Also, if $\Omega$ is countable so is $T\Omega$, while $T$ is
not necessarily a one-to-one transformation.

Now, suppose that a measure space $(\Omega,G,\mu)$ and a closed convex family $\mathcal{T}$ of
transformations $T$ are given, such that for every $T \in \mathcal{T}$, $TG$ is a σ-algebra. Then,
each $T \in \mathcal{T}$ induces a unique measure space $(T\Omega, TG, \nu_T(\mu))$.

Let two different measures $\mu_1, \mu_2$ be assigned on $(\Omega,G)$ and let $T$ be some
transformation from the family $\mathcal{T}$. Then, two measures $\nu_T(\mu_1)$ and $\nu_T(\mu_2)$ defined on
$(T\Omega,TG)$ are induced by $T, \mu_1, \mu_2$. If some distance $d(\nu_T(\mu_1), \nu_T(\mu_2))$ is convex ∩
with respect to $T$ for $T \in \mathcal{T}$, then there is a unique transformation $T_o \in \mathcal{T}$ that, applied
on the measure spaces $(\Omega,G,\mu_1)$, $(\Omega,G,\mu_2)$, induces measures $\nu_{T_o}(\mu_1), \nu_{T_o}(\mu_2)$ that
realize $\max_{T \in \mathcal{T}} d(\nu_T(\mu_1), \nu_T(\mu_2))$.

In this section we will be concerned with a particular space $(\Omega,G)$, and
particular measures $\mu_1, \mu_2$ and transformation $T$. Specifically, $\Omega$ will be the $E^n$
Euclidean space, and the σ-algebra $G$ will include all sets $(-\infty, X_n)$, where $X_n$ is an
n-dimensional vector defined on $E^n$. The family $\mathcal{T}$ of transformations will be the family
of row vectors $C_n$ of dimensionality $n$ that satisfy some restriction. If this
restriction is $C_n R C_n' \le a$, where $a$ is some constant and $R$ some $n \times n$ nonnegative
matrix, the family of transformations is convex. The measures $\mu_1, \mu_2$ will be, in
general, probability measures, and more specifically Gaussian probability measures
defined on $(E^n, G)$:

$$\mu_i(-\infty, X_n) = \int_{-\infty}^{X_n} f_{ni}(Y_n)\,dY_n;\quad i = 1,2 \qquad (22)$$

where

$$f_{ni}(Y_n) = (2\pi)^{-n/2}\, |R_{ni}|^{-1/2} \exp\left\{-\frac{1}{2}(Y_n - M_{ni})' R_{ni}^{-1} (Y_n - M_{ni})\right\};\quad i = 1,2 \qquad (23)$$

Then, for $T = C_n$ we obtain from (22), (23):

$$\nu_T(\mu_i)(-\infty, z) = \int_{-\infty}^{z} f_{iT}(w)\,dw;\quad i = 1,2 \qquad (24)$$

where

$$f_{iT}(w) = (2\pi\, C_n R_{ni} C_n')^{-1/2} \exp\left\{-\frac{(w - C_n M_{ni})^2}{2\, C_n R_{ni} C_n'}\right\};\quad i = 1,2 \qquad (25)$$

Finally, let the convex family of transformations $\mathcal{T}$ be expressed by the
restriction

$$C_n R_{n1} C_n' \le A \qquad (26)$$

where $R_{n1}$ is the positive definite covariance matrix in (23) corresponding to $i = 1$.
The optimal transformation (if any) that has the best discriminant effect
will be sought, where this effect is measured by the Bhattacharyya, I-divergence,
Lévy, Vasershtein, Kolmogorov and variational distances. Each of the distances will
be examined separately, for the two cases $M_{n1} = M_{n2}$ and $R_{n1} = R_{n2}$.

a. $M_{n1} = M_{n2} = M_n$

I. Bhattacharyya distance

In this case, the Bhattacharyya distance in (8) becomes:

[expression illegible in the source]    (27)

Let $R_{n1} = WW'$, where $W$ is the matrix of the relative eigenvectors, and $R_{n2} = WLW'$,
where $L$ is the diagonal matrix with the eigenvalues $\lambda_i$ of $R_{n2}$ w.r.t. $R_{n1}$. Then,
the optimal transformation $C_n^{o}$ that maximizes $d_B(f_{1T},f_{2T})$ under the restriction
in (26) is actually an infinite set of values lying on a line and described by:

$$C_n^{o} = K\, C_n^{\max\{\lambda_{\max} + \lambda_{\max}^{-1},\ \lambda_{\min} + \lambda_{\min}^{-1}\}} \qquad (28)$$

where $K$ is any constant that does not exceed $\sqrt{A}$ in absolute value, and
$C_n^{\max\{\lambda_{\max} + \lambda_{\max}^{-1},\ \lambda_{\min} + \lambda_{\min}^{-1}\}}$ is the one from $C_n^{\lambda_{\min}}, C_n^{\lambda_{\max}}$ that realizes the
maximum between $\lambda_{\max} + \lambda_{\max}^{-1}$ and $\lambda_{\min} + \lambda_{\min}^{-1}$, with

$$C_n^{\lambda_{\min}} = [0, \ldots, 0, 1, 0, \ldots, 0]\, W^{-1} \qquad (29)$$

(the 1 in the position corresponding to the minimum eigenvalue in $L$), and

$$C_n^{\lambda_{\max}} = [0, \ldots, 0, 1, 0, \ldots, 0]\, W^{-1} \qquad (30)$$

(the 1 in the position corresponding to the maximum eigenvalue in $L$).

The proof of this result is in appendix B.
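The selection rule between the extreme eigenvalues can be illustrated with a short Python sketch. It uses the standard closed form of the Bhattacharyya distance (8) between two Gaussians with a common mean and variances v1, v2, namely (1/2) ln[(v1 + v2) / (2 sqrt(v1 v2))]; this form is a textbook result and is not copied from the paper's expression for the distance. The eigenvalues and function names are hypothetical. Along the k-th relative eigenvector the projected variances are proportional to 1 and λ_k, so the distance depends on λ_k alone.

```python
import math

def bhattacharyya_equal_mean(var1, var2):
    """Standard closed form of (8) for two Gaussians with a common mean."""
    return 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))

def bhattacharyya_numeric(var1, var2, half_width=12.0, steps=24000):
    """Trapezoid evaluation of minus the log of the overlap integral."""
    def overlap(x):
        f1 = math.exp(-0.5 * x * x / var1) / math.sqrt(2.0 * math.pi * var1)
        f2 = math.exp(-0.5 * x * x / var2) / math.sqrt(2.0 * math.pi * var2)
        return math.sqrt(f1 * f2)
    h = 2.0 * half_width / steps
    total = 0.5 * (overlap(-half_width) + overlap(half_width))
    for k in range(1, steps):
        total += overlap(-half_width + k * h)
    return -math.log(total * h)

# Hypothetical relative eigenvalues of R_n2 with respect to R_n1.
eigenvalues = [0.2, 0.8, 1.5, 3.0]

distances = [bhattacharyya_equal_mean(1.0, lam) for lam in eigenvalues]
best_by_distance = max(range(len(eigenvalues)), key=lambda k: distances[k])
best_by_rule = max(range(len(eigenvalues)),
                   key=lambda k: eigenvalues[k] + 1.0 / eigenvalues[k])
print(best_by_distance, best_by_rule)
```

Because the closed form is an increasing function of λ + 1/λ, the direction with the largest distance is the one with the largest λ + 1/λ, which can only occur at the minimum or the maximum eigenvalue.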


The maximum distance is given by the following expression:

[expression illegible in the source]    (31)

II. Variational distance

$$d_{Vr}(f_{1T},f_{2T}) = 4\left|\Phi\left(\left[\frac{r \ln r}{r-1}\right]^{1/2}\right) - \Phi\left(\left[\frac{\ln r}{r-1}\right]^{1/2}\right)\right|, \qquad r = \frac{C_n R_{n2} C_n'}{C_n R_{n1} C_n'} \qquad (32)$$

where

$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{u^2}{2}\right\} du$$

as shown in appendix B. Again the maximum distance corresponds to

[expression illegible in the source; it fixes the ratio $C_n R_{n2} C_n' / C_n R_{n1} C_n'$ in terms of $\lambda_{\max}$ and $\lambda_{\min}$]    (33)

where $\lambda_{\max}, \lambda_{\min}$ are the maximum and minimum eigenvalues in $L$ correspondingly.
The corresponding linear "optimal" transformations are again given by (29) and (30)
multiplied by any constant not exceeding $\sqrt{A}$ in absolute value.

The maximum value of the distance is

[expression illegible in the source]    (34)

III. Kolmogorov distance

It can be found easily that

$$d_K(f_{1T},f_{2T}) = \left|\Phi\left(\left[\frac{r \ln r}{r-1}\right]^{1/2}\right) - \Phi\left(\left[\frac{\ln r}{r-1}\right]^{1/2}\right)\right|, \qquad r = \frac{C_n R_{n2} C_n'}{C_n R_{n1} C_n'} \qquad (35)$$

The same transformations $C_n^{\lambda_{\min}}, C_n^{\lambda_{\max}}$ in (29), (30), (28) stand and

$$d_{K\max} = \frac{1}{4}\, d_{Vr\max} \qquad (36)$$

where $d_{Vr\max}$ is given by (34).
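The factor 1/4 between the Kolmogorov and variational distances in the equal mean case can be checked numerically. The sketch below is an illustration under assumptions: the variance ratio r = 3 is hypothetical, and the closed form used for comparison is the standard expression for the integral of |f1 - f2| between two zero-mean Gaussians, obtained from the points where the two densities cross. Function names are chosen for the example.

```python
import math

def phi(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def variational_closed_form(r):
    """Integral of |f1 - f2| for zero-mean Gaussians, variance ratio r != 1."""
    a = math.sqrt(r * math.log(r) / (r - 1.0))
    b = math.sqrt(math.log(r) / (r - 1.0))
    return 4.0 * abs(phi(a) - phi(b))

def pdf(x, var):
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

r = 3.0  # hypothetical ratio of the two projected variances
step = 0.001
grid = [-15.0 + step * k for k in range(30001)]

# Kolmogorov distance: largest gap between the cumulative distributions.
d_k = max(abs(phi(x) - phi(x / math.sqrt(r))) for x in grid)

# Variational distance: trapezoid rule for the integral of |f1 - f2|.
gaps = [abs(pdf(x, 1.0) - pdf(x, r)) for x in grid]
d_v = step * (sum(gaps) - 0.5 * (gaps[0] + gaps[-1]))
print(d_k, d_v, variational_closed_form(r))
```

The largest gap between the cumulative distributions occurs at the crossing points of the densities, which is why the two distances differ by the constant factor.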


IV. I-divergence distance

It can easily be found that:

$$d_I(f_{1T},f_{2T}) = -\frac{1}{2} + \frac{1}{4}\left[\frac{C_n R_{n2} C_n'}{C_n R_{n1} C_n'} + \frac{C_n R_{n1} C_n'}{C_n R_{n2} C_n'}\right] \qquad (37)$$

It is obvious that in (37) the same symmetries that appear in (27) exist.
Therefore, the "optimal" linear transformation is the same as the one for
the Bhattacharyya distance. The maximum distance value is given by:

$$d_{I\max} = -\frac{1}{2} + \frac{1}{4}\max\{\lambda_{\max} + \lambda_{\max}^{-1},\ \lambda_{\min} + \lambda_{\min}^{-1}\} \qquad (38)$$

V. Lévy distance

We left the Lévy and Vasershtein distances for last because they both involve a
penalty or distortion measure and therefore constitute a separate group.

If $\rho(\cdot,\cdot)$ is the distortion measure used, denote by $\rho^{-1}(\epsilon)$ the maximum $y$
that gives $\rho(x,x+y) = \epsilon$ for any $x$. Then, the Lévy distance for the equal mean
model considered here is given by:

[expression illegible in the source]    (39)

As shown in appendix B, finding the Lévy distance for given transformations,
or even more finding the "optimal" discriminant transformation in this case,
becomes a task that can only be approached numerically. This is a strong indication
that the Prokhorov and Lévy distances are characterized by the disadvantage
of difficult calculability. In cases where the inclusion of a penalty or
distortion measure is desirable, the Vasershtein distance is a better choice, as we will
show below.


VI. Vasershtein distance

We will work with specific distortion measures here. The first such distortion
measure to be examined will be the popular $\rho_s(x,y) = (x-y)^2$.

For the equal mean case we are considering here, it can be shown directly:

$$d_{V,\rho_s}(f_{1T},f_{2T}) = \inf \{C_n R_{n1} C_n' + C_n R_{n2} C_n' - 2E[(y - C_n M_n)(x - C_n M_n)]\} \qquad (40)$$

where the infimum is over all joint statistics with $f_{1T}, f_{2T}$ marginals.

As shown in [16], the Vasershtein distance $d_{V,\rho_s}(f_{1T},f_{2T})$ can actually be
found if knowledge of some structure involving the initial data $X_n, Y_n$ is assumed.
Specifically, let $X_n$ be $n$ samples from a Gaussian, wide sense stationary
process whose nth order statistics are described by $f_{n1}(X_n)$ in (23). Then,* the
autocorrelation matrix $R_{n1}$ is a Toeplitz matrix. Let $Y_n$ be from a Gaussian wide sense
stationary process also, whose nth order statistics are described by $f_{n2}(Y_n)$ in (23),
and $R_{n2}$ is again Toeplitz. Then the scalar variables $C_n X_n, C_n Y_n$ are samples from
Gaussian, wide sense stationary processes also. In fact, the power spectra of the
variables $C_n X_n, C_n Y_n$ exist and are given by the following expressions correspondingly:

$$P_{f_{1T}}(\lambda) = \sum_{k=-\infty}^{\infty} C_n R_{n1}(k) C_n'\, e^{-jk\lambda} \qquad (41)$$

$$P_{f_{2T}}(\lambda) = \sum_{k=-\infty}^{\infty} C_n R_{n2}(k) C_n'\, e^{-jk\lambda} \qquad (42)$$

where $P_{f_{iT}}(\lambda)$; $i = 1,2$ denotes the power spectrum under the corresponding n order
statistics in (23) and the transformation. Also, $R_{ni}(k) = E\{(X_n - M_{ni})(X_{n+k} - M_{n+k,i})'\}$,
where the expectation is taken over the statistics that in n order are given by $f_{ni}$ in (23),
and what has been denoted by $R_{ni}$ till now is actually $R_{ni}(0)$.

Let us restrict ourselves to cross-stationary processes. Then the infimum
in (40) will be taken over all cross-stationary statistics with $f_{1T}, f_{2T}$ marginals,
and the cross power spectrum exists and is denoted by $P_{f_{1T},f_{2T}}(\lambda)$.

As explained in [16], p. 324, the Vasershtein distance under the above
restrictions is given by the following expression:

$$d_{V,\rho_s}(f_{1T},f_{2T}) = (2\pi)^{-1} \int_{-\pi}^{\pi} |P_{f_{1T}}^{1/2}(\lambda) - P_{f_{2T}}^{1/2}(\lambda)|^2\, d\lambda \qquad (43)$$

where $P_{f_{iT}}(\lambda)$ are given by (41) and (42). We want to point out here that for the
existence of $P_{f_{iT}}(\lambda)$ it is sufficient that the vectors $X_n, Y_n$ form cross stationary
processes in time. In this case, the matrices $R_{n1}(0), R_{n2}(0)$ do not have to be Toeplitz.

Let $X_n(j), Y_n(j)$ denote n-dimensional data, collected at time $j$ from populations
distributed as in $f_{n1}, f_{n2}$ correspondingly. Let both $X_n(j), Y_n(j)$; $j = 0, 1, \ldots$
be first order Markov. Then, the spectra in (41) and (42) become:

$$P_{f_{1T}}(\lambda) = C_n R_{n1}(0) C_n' + 2\, C_n R_{n1}(1) C_n' \cos\lambda \qquad (44)$$

$$P_{f_{2T}}(\lambda) = C_n R_{n2}(0) C_n' + 2\, C_n R_{n2}(1) C_n' \cos\lambda \qquad (45)$$

If expressions (44) and (45) are substituted in (43) the following expression is
obtained:

$$d_{V,\rho_s}(f_{1T},f_{2T}) = (2\pi)^{-1} \int_{-\pi}^{\pi} \left| [C_n R_{n1}(0) C_n' + 2\, C_n R_{n1}(1) C_n' \cos\lambda]^{1/2} - [C_n R_{n2}(0) C_n' + 2\, C_n R_{n2}(1) C_n' \cos\lambda]^{1/2} \right|^2 d\lambda \qquad (46)$$

Calculating the expression in (46) analytically to study extremes with respect to the
linear transformation is not possible. For that reason we are using bounds. Indeed,
applying the Schwarz inequality on the integral in (46) we obtain:

$$d_{V,\rho_s}(f_{1T},f_{2T}) \ge \left[ (C_n R_{n1}(0) C_n')^{1/2} - (C_n R_{n2}(0) C_n')^{1/2} \right]^2 \qquad (47)$$

with equality if and only if there is some constant $B$ such that:

$$C_n \left\{ [R_{n1}(0) - B R_{n2}(0)] + 2\cos\lambda\, [R_{n1}(1) - B R_{n2}(1)] \right\} C_n' = 0 \quad \text{for almost all } \lambda \in [-\pi,\pi] \qquad (48)$$
To maximize $d_{V,\rho_s}(f_{1T},f_{2T})$ with respect to the choice of $C_n$ we will maximize
the lower bound in (47) instead. Since this bound can be written:

$$C_n R_{n1}(0) C_n' \left[ 1 - \left( \frac{C_n R_{n2}(0) C_n'}{C_n R_{n1}(0) C_n'} \right)^{1/2} \right]^2 \qquad (49)$$

and due to the fact that, as shown in appendix B, the ratio $\frac{C_n R_{n2}(0) C_n'}{C_n R_{n1}(0) C_n'}$ can only
vary in $[\lambda_{\min}, \lambda_{\max}]$ with changing $C_n$ (where $\lambda$ are the eigenvalues in $L$), the bound in
(49) can increase to infinity for unrestricted transformations. However, if (26)
is true the bound in (49) is maximized for:

$$C_n = \sqrt{A}\, [0, \ldots, 0, 1, 0, \ldots, 0]\, W^{-1} \qquad (50)$$

where the value 1 in $[0, \ldots, 0, 1, 0, \ldots, 0]$ corresponds to the position in $L$ that belongs
to $\max(\lambda_{\max}, \lambda_{\min}^{-1})$. The value of this maximum bound is given by:

[expression illegible in the source]    (51)

We want to emphasize here that the Vasershtein distance as well as the bound
in (49) depend only on the second order statistics of the data $X_n(j), Y_n(j)$ and they
are totally insensitive to the exact underlying distribution. This property is
valuable in the case of ill-defined environments.

If the distortion measure is $\rho_\ell(x,y) = |x-y|$ instead of $(x-y)^2$, the Vasershtein
distance $d_{V,\rho_\ell}(f_{1T},f_{2T})$ is bounded from below ([16], th. 5) by

[expression illegible in the source]    (52)

The bound in (52) is obviously maximized for the transformation in (50), where
the restriction (26) is again true. This maximum is given by the following expression:

[expression illegible in the source]    (53)

VII. General Observations

It is evident from the preceding analysis that for data distributed
as described by $f_{n1}(X_n), f_{n2}(Y_n)$ in (23) correspondingly, with $M_{n1} = M_{n2}$, and restricted
linear transformations $C_n$, where the restriction is described by (26), the
Bhattacharyya, I-divergence, Kolmogorov and variational distances are all maximized by
the same transformation

$$C_n = K\, [0, \ldots, 0, 1, 0, \ldots, 0]\, W^{-1} \qquad (54)$$

where $R_{n1} = WW'$, $R_{n2} = WLW'$, $L = \{\lambda_i\}$, the digit 1 in (54) corresponds to the position
of $\max(\lambda_{\max}, \lambda_{\min}^{-1})$, and the amplitude $K$ is such that $K \in [-\sqrt{A}, \sqrt{A}]$.

The Vasershtein distance, with the assumption of $X_n(j), Y_n(j)$; $j = 0, 1, \ldots$ being
n-dimensional stationary, first order Markov processes and with underlying distortion
measures either $\rho_s(x,y) = (x-y)^2$ or $\rho_\ell(x,y) = |x-y|$, does not require the specific
statistics described by (23). In addition, a certain lower bound on this distance
is again maximized by the transformations in (54), where the amplitude $K$ is
equal to $\sqrt{A}$.


b. $R_{n1} = R_{n2} = R_n$; $M_n = M_{n2} - M_{n1}$

I. Bhattacharyya distance

The Bhattacharyya distance is given in this case by the following expression:

$$d_B(f_{1T},f_{2T}) = \frac{1}{8}\, \frac{(C_n M_n)^2}{C_n R_n C_n'} \qquad (55)$$

Our objective is to maximize the ratio in (55) with respect to linear
transformations $C_n$ that satisfy the restriction in (26).

It is straightforward to obtain the following expression for the "optimal" $C_n$:

$$C_n = K M_n' R_n^{-1}, \qquad K \in [-\sqrt{A}, \sqrt{A}] \qquad (56)$$

The maximum value of the distance is:

$$d_{B\max} = \frac{1}{8}\, M_n' R_n^{-1} M_n \qquad (57)$$
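The maximization of the ratio $(C_n M_n)^2 / (C_n R_n C_n')$ can be illustrated in two dimensions with the following sketch. The covariance matrix, the mean difference and the function names are hypothetical values chosen for the example. The code forms the direction $M_n' R_n^{-1}$, evaluates the ratio there, and compares it with a scan over unit directions; the ratio does not depend on the amplitude K, so no scaling is applied.

```python
import math

def quadratic_form(c, r):
    """C R C' for a 2-vector c and a symmetric 2x2 matrix r."""
    return (c[0] * (r[0][0] * c[0] + r[0][1] * c[1])
            + c[1] * (r[1][0] * c[0] + r[1][1] * c[1]))

def separation_ratio(c, m, r):
    """(C M)^2 / (C R C') for a row vector c and mean difference m."""
    cm = c[0] * m[0] + c[1] * m[1]
    return cm * cm / quadratic_form(c, r)

def inverse_2x2(r):
    det = r[0][0] * r[1][1] - r[0][1] * r[1][0]
    return [[r[1][1] / det, -r[0][1] / det],
            [-r[1][0] / det, r[0][0] / det]]

R = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical common covariance R_n
M = [1.0, -1.0]               # hypothetical mean difference M_n

R_inv = inverse_2x2(R)
# Row vector M' R^{-1}.
c_opt = [M[0] * R_inv[0][0] + M[1] * R_inv[1][0],
         M[0] * R_inv[0][1] + M[1] * R_inv[1][1]]

best = separation_ratio(c_opt, M, R)
m_rinv_m = c_opt[0] * M[0] + c_opt[1] * M[1]  # M' R^{-1} M

# Scan unit directions; none should exceed the value at c_opt.
scan = max(separation_ratio([math.cos(t), math.sin(t)], M, R)
           for t in [math.pi * k / 720.0 for k in range(720)])

d_b_max = best / 8.0  # Bhattacharyya distance at the maximizing direction
print(best, m_rinv_m, scan, d_b_max)
```

The agreement between `best` and `m_rinv_m` reflects the Schwarz inequality, which gives $M_n' R_n^{-1} M_n$ as the largest value of the ratio.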


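II. Variational distance

It is easy to find that here:

$$d_{Vr}(f_{1T},f_{2T}) = 2\left[ \Phi\left( \frac{C_n M_n}{2\sqrt{C_n R_n C_n'}} \right) - \Phi\left( -\frac{C_n M_n}{2\sqrt{C_n R_n C_n'}} \right) \right] \qquad (58)$$

where

$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{u^2}{2}\right\} du$$

The distance in (58) is maximized for the $C_n$ that realizes the maximum of
$\frac{C_n M_n}{2\sqrt{C_n R_n C_n'}}$, which is the one in (55) again.

The maximum distance value is given by:

$$d_{Vr\max} = 2\left[ \Phi\left( 2^{-1}\sqrt{M_n' R_n^{-1} M_n} \right) - \Phi\left( -2^{-1}\sqrt{M_n' R_n^{-1} M_n} \right) \right] \qquad (59)$$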

III. Kolmogorov distance

Then,

$$d_K(f_{1T},f_{2T}) = \Phi\left( \frac{C_n M_n}{2\sqrt{C_n R_n C_n'}} \right) - \Phi\left( -\frac{C_n M_n}{2\sqrt{C_n R_n C_n'}} \right) \qquad (60)$$

$$d_{K\max} = 2^{-1}\, d_{Vr\max} \qquad (61)$$

where $d_{Vr\max}$ is given by (59) and the "optimal" $C_n$ is again the one in (56).

IV. I-divergence distance

It can be easily found again that in the present case:

$$d_I(f_{1T},f_{2T}) = \frac{1}{2}\, \frac{(C_n M_n)^2}{C_n R_n C_n'} \qquad (62)$$

and of course the maximum is also obtained by the $C_n$ in (56) and

$$d_{I\max} = \frac{1}{2}\, M_n' R_n^{-1} M_n \qquad (63)$$


V. Lévy Distance

As we will see, in this case of equal covariances the "optimal"
transformation in the Lévy distance sense can be found. It is shown in
appendix B that in the present case:

$$d_{L,\rho}(f_{1T},f_{2T}) = \inf\{\epsilon : \text{[condition illegible in the source]}\} \qquad (64)$$

where $\rho^{-1}(\epsilon) = \max(y : \rho(x,x+y) = \epsilon)$.

For momentarily fixed power $C_n R_n C_n'$, the transformation that obtains
maximum $d_{L,\rho}$ in (64) is the one that maximizes $C_n M_n$. That is because, for two
values $x_1 < x_2$ of $C_n M_n$, it is obviously true that every $\epsilon$ candidate for
$d_{L,\rho}(f_{1T},f_{2T})$ with $C_n M_n = x_2$ is also a candidate with $C_n M_n = x_1$ instead.

So, letting now $C_n R_n C_n'$ vary in $[0,A]$, we obtain the maximum Lévy distance
in (64) for $C_n$ as in (56) with amplitude $K$ fixed and equal to $\sqrt{A}$.
n 

VI.  Vasershtein  Distance 


Let again (as in section 3) $X_n(j)$, $Y_n(j)$ be stationary
n-dimensional processes that are also first order Markov. Let $X_n(j)$
come from a population with $M_{n1}$ and $R_n(0)$, $R_n(1)$, and let $Y_n(j)$ come
from another population with $M_{n2}$ and the same $R_n(0)$, $R_n(1)$. The spectra
$P_{f_{1T}}(\lambda)$, $P_{f_{2T}}(\lambda)$ in (43) are then equal, and the Vasershtein distance
for cross-stationary $X_n(j)$, $Y_n(j)$ is realized for joint spectrum

$$P_{f_{1T} f_{2T}}(\lambda) = \left[P_{f_{1T}}(\lambda)\, P_{f_{2T}}(\lambda)\right]^{1/2}$$

and it is equal to:

$$d_{V\rho_s}(f_{1T}, f_{2T}) = (C_n M_n)^2 \qquad (65)$$

where $\rho_s(x,y) = (x - y)^2$.


The distance in (65) is obviously maximized for $C_n$ given by (56) with
fixed amplitude $K = \sqrt{A}$.

The maximum value of the distance is:

$$d_{V\rho_s\,max} = A\,(M_n' R_n^{-1} M_n) \qquad (66)$$
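The maximum in (66) can be checked by a small numerical experiment. The sketch below assumes the power restriction $C_n R_n C_n' \le A$ and, as an assumption about (56), a $C_n$ proportional to $M_n' R_n^{-1}$; the values of `M`, `R` and `A` are hypothetical.

```python
import math
import random

M = (1.0, 2.0)                    # hypothetical mean difference M_n
R = ((2.0, 0.5), (0.5, 1.0))      # hypothetical common covariance R_n
A = 3.0                           # hypothetical power bound

def quad(c, r):
    """Quadratic form c r c' for a 2-vector c and a 2x2 matrix r."""
    return sum(c[i] * r[i][j] * c[j] for i in range(2) for j in range(2))

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def scaled_to_power(c, power):
    """Rescale c so that c R c' equals the given power."""
    k = math.sqrt(power / quad(c, R))
    return (k * c[0], k * c[1])

det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
R_inv = ((R[1][1] / det, -R[0][1] / det), (-R[1][0] / det, R[0][0] / det))
q = quad(M, R_inv)                # M' R^{-1} M

# Assumed optimal direction M' R^{-1}, used at full power A.
direction = tuple(sum(M[i] * R_inv[i][j] for i in range(2)) for j in range(2))
C_opt = scaled_to_power(direction, A)
d_max = dot(C_opt, M) ** 2        # (C M)^2 as in (65)

# Random reductions at the same power should never exceed d_max.
random.seed(1)
best_random = max(
    dot(scaled_to_power((random.uniform(-1, 1), random.uniform(-1, 1)), A), M) ** 2
    for _ in range(2000)
)
print(d_max, A * q, best_random)
```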


If the distortion measure is $\rho_a(x,y) = |x - y|$, then

$$E\{\rho_a(x,y)\} = E\{\,|(x - C_n M_{n1}) - (y - C_n M_{n2}) - C_n M_n|\,\} \qquad (67)$$

where the expectation is over all joint statistics with $f_{1T}$, $f_{2T}$ marginals.

For this case of equal spectra, and due to the lower bound given by theorem 5
in [16], we have from (67):

E - E (|(x-C^«„,)  - (y-C„H„2)|)  > 

- - 9^  (')  I “ 


-n  IT  2T 


C M I 
n n ’ 


*Footnote: For the structure considered here and in general $R_{n1}(k) \ne R_{n2}(k)$,
$k = 0, 1, \ldots$, the Vasershtein distance is given by the expression:

$$d_{V\rho_s}(f_{1T}, f_{2T}) = \int_{-\pi}^{\pi} \left| P_{f_{1T}}^{1/2}(\lambda) - P_{f_{2T}}^{1/2}(\lambda) \right|^2 d\lambda + (C_n M_n)^2$$
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and  the  rate  distortion  theory,  have  been  evaluated  and  used  for  the  linear 

reduction of Gaussian data to one scalar parameter. It was found that
while the Bhattacharyya, I-divergence, variational, Kolmogorov and Levy
distances are over-sensitive to the underlying statistics, the Vasershtein
distance depends only on second order moments.

Also, while the Levy distance is hard to calculate analytically
even in the simple case of Gaussian data, simple lower bounds on the Vasershtein
distance can be found for the distortion measures $\rho_s(x,y) = (x-y)^2$ and
$\rho_a(x,y) = |x-y|$.

Finally, for the Gaussian data and the linear reduction mentioned above,
it was found that all distances (whenever the result is analytically feasible)
give the same "optimal" transformation, the one with the strongest class-discrimination
properties. This is true in both the equal mean and equal covariance cases.
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Appendix  A 

For  the  proof  of  inequality  (3)  in  section  2,  the  following  theorem 
by  Strassen  ([7],  Th.  11)  is  needed. 

Theorem  I 

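Given two measures $\mu_1$, $\mu_2$ defined on the separable complete space
$(\Omega, G)$, the inequality

$$\inf\{\epsilon : \mu_1(A) \le \mu_2(\cup B;\ B \in G,\ \rho(A,B) \le \epsilon) + \epsilon ;\ \forall A \in G\} \le \alpha \qquad (a_1)$$

is true if and only if there is some joint measure $r(A,B)$ with $\mu_1(A)$, $\mu_2(B)$
marginals such that

$$r(A,B : \rho(A,B) > \alpha) \le \alpha \qquad (a_2)$$

Using Theorem I we will prove the following lemma.


Lemma I

If $d_\rho(\mu_1,\mu_2)$, $d_{V,\rho}(\mu_1,\mu_2)$ are the Prokhorov and Vasershtein distances
correspondingly, as defined by (1) and (2), then for every $\rho(\cdot,\cdot)$ and measures
$\mu_1$, $\mu_2$ defined on $(\Omega, G)$ and such that $\mu_1(\Omega) = \mu_2(\Omega)$, the following inequality
is true:

$$d_\rho^2(\mu_1,\mu_2) \le d_{V,\rho}(\mu_1,\mu_2) \qquad (a_3)$$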

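Proof:

Let

$$\epsilon_p = \inf\{\epsilon : \mu_1(A) \le \mu_2(\cup B;\ B \in G,\ \rho(A,B) \le \epsilon) + \epsilon ;\ \forall A \in G\}$$

For $\delta > 0$, there is some $\epsilon_\delta$ : $\epsilon_p - \delta < \epsilon_\delta < \epsilon_p$, and there is some $A_\delta$ giving:

$$\mu_1(A_\delta) > \mu_2(\cup B;\ B \in G,\ \rho(A_\delta,B) \le \epsilon_\delta) + \epsilon_\delta .$$

By Theorem I, there is then no joint measure $r(\cdot,\cdot)$ such that
$r(A,B : \rho(A,B) > \epsilon_\delta) \le \epsilon_\delta$.

That is, for every $r(\cdot,\cdot)$ choice, it is true that
$r(A,B : \rho(A,B) > \epsilon_\delta) > \epsilon_\delta$.

Since

$$E\{\rho(A,B)\} = E_{A,B:\rho(A,B)>\alpha}\{\rho(A,B)\} + E_{A,B:\rho(A,B)\le\alpha}\{\rho(A,B)\} \ge \alpha\, r(A,B : \rho(A,B) > \alpha) ;\ \forall \alpha$$

we can obtain that for every $r(\cdot,\cdot)$ choice it is true that

$$E_{r(\cdot,\cdot)}\{\rho(A,B)\} \ge \epsilon_\delta\, r(A,B : \rho(A,B) > \epsilon_\delta) > \epsilon_\delta^2 .$$

Therefore, if by $\epsilon_V$ we denote the Vasershtein distance, we have:

$$\epsilon_V = \inf_{r(\cdot,\cdot)} E_{r(\cdot,\cdot)}\{\rho(A,B)\} \ge \epsilon_\delta^2$$

where the infimum is over every $r(\cdot,\cdot)$ with $\mu_1$, $\mu_2$ marginals. Since $\epsilon_\delta$ can be
taken arbitrarily close to $\epsilon_p$, it follows that $\epsilon_V \ge \epsilon_p^2$.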
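The inequality between the two distances can be illustrated on a small discrete example. The sketch below is an illustration under stated assumptions, not part of the proof: the measures are probability measures on a few points of the real line, the distortion is $\rho(x,y) = |x - y|$, the Prokhorov-type infimum is found by bisection over a brute-force check of all subsets, and the Vasershtein distance for this distortion is computed from the distribution functions. All names and numbers are hypothetical.

```python
import itertools

def prokhorov(points, p, q, tol=1e-9):
    """inf{eps : p(A) <= q(closed eps-neighbourhood of A) + eps for all A}."""
    n = len(points)
    subsets = [s for k in range(1, n + 1)
               for s in itertools.combinations(range(n), k)]

    def feasible(eps):
        for s in subsets:
            mass_a = sum(p[i] for i in s)
            mass_nbhd = sum(q[j] for j in range(n)
                            if min(abs(points[j] - points[i]) for i in s) <= eps)
            if mass_a > mass_nbhd + eps + 1e-12:
                return False
        return True

    lo, hi = 0.0, 1.0     # eps = 1 is always feasible for probability measures
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi

def vasershtein_abs(points, p, q):
    """inf over couplings of E|x - y| on the line: the integral of |F1 - F2|.
    Assumes the points are distinct."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    total, cdf_gap = 0.0, 0.0
    for a, b in zip(order, order[1:]):
        cdf_gap += p[a] - q[a]
        total += abs(cdf_gap) * (points[b] - points[a])
    return total

points = [0.0, 0.3, 1.0]
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
d_p = prokhorov(points, p, q)
d_v = vasershtein_abs(points, p, q)
print(d_p ** 2, d_v)      # the first value should not exceed the second
```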

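Proof of Lemma 2

If

$$\rho(A,B) = \begin{cases} S , & A \ne B \\ 0 , & A = B \end{cases}$$

then the inequality

$$\mu_1(A) \le \mu_2(\cup B;\ B \in G,\ \rho(A,B) \le \epsilon) + \epsilon \qquad (a_4)$$

is equivalent to

$$\begin{aligned} \mu_1(A) &\le \mu_2(A) + \epsilon ; & \epsilon < S \\ \mu_1(A) &\le \mu_2(\cup B;\ B \in G) + \epsilon = \mu_2(\Omega) + \epsilon ; & \epsilon \ge S \end{aligned} \qquad (a_5)$$

If $S = \mu_1(\Omega) = \mu_2(\Omega) \ge \max(\mu_1(A), \mu_2(A))$ ; $\forall A \in G$,
then the second part of $(a_5)$ is always true (for all $A \in G$) because

$$\mu_1(A) \le \mu_2(\cup B;\ B \in G) = \mu_2(\Omega) ;\ \forall A \in G$$

Thus, the

$$\inf\{\epsilon : \mu_1(A) \le \mu_2(\cup B;\ B \in G,\ \rho(A,B) \le \epsilon) + \epsilon ;\ \forall A \in G\}$$

is equal to $S$ if for every $\epsilon < S$ there is some $A \in G$ such that either
$\mu_1(A) > \mu_2(A) + \epsilon$ or $\mu_2(A) > \mu_1(A) + \epsilon$.

That is because then:

$\sup_{A \in G} |\mu_1(A) - \mu_2(A)| > \epsilon$ ; $\forall \epsilon < S$, which leads to:

$\sup_{A \in G} |\mu_1(A) - \mu_2(A)| \ge S$. But since $0 \le \mu_i(A) \le S$ ; $\forall A \in G$ ;
$i = 1, 2$, the supremum above cannot exceed $S$ and it can only be equal
to it.

On the other hand, if for some $\epsilon < S$ the inequalities $\mu_1(A) \le \mu_2(A) + \epsilon$,
$\mu_2(A) \le \mu_1(A) + \epsilon$ are true $\forall A \in G$, then the infimum in $(a_4)$ becomes equal to:

$$\begin{aligned} &\inf\{\epsilon : \epsilon < S ,\ \mu_1(A) \le \mu_2(A) + \epsilon ,\ \mu_2(A) \le \mu_1(A) + \epsilon ;\ \forall A \in G\} \\ &= \inf\{\epsilon : \epsilon < S ,\ |\mu_1(A) - \mu_2(A)| \le \epsilon ;\ \forall A \in G\} \\ &= \sup_{A \in G} |\mu_1(A) - \mu_2(A)| = d_K(\mu_1,\mu_2) \end{aligned}$$

The proof is now complete.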


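Proof of Lemma 3


We have in this case:

$$\begin{aligned} d_{V,\rho}(\mu_1,\mu_2) &= \inf_{r(\cdot,\cdot)\ \text{with}\ \mu_1,\mu_2\ \text{marginals}} r(A,B;\ A,B \in G,\ A \ne B) \\ &= \inf_{r(\cdot,\cdot)} r(A,B;\ A,B \in G,\ \rho(A,B) = 1) \\ &= \inf_{r(\cdot,\cdot)} r(A,B;\ A,B \in G,\ \rho(A,B) \ge 1) \end{aligned} \qquad (a_6)$$

Let $d_{V,\rho}(\mu_1,\mu_2) = \epsilon_V < 1$.

Then, we obtain from $(a_6)$:

$$\inf_{r(\cdot,\cdot)} r(A,B;\ A,B \in G,\ \rho(A,B) \ge 1) = \epsilon_V$$

So, for every $\delta > 0$ there is some $r(\cdot,\cdot)$ with $\mu_1$, $\mu_2$ marginals such that

$$r(A,B;\ A,B \in G,\ \rho(A,B) \ge 1) \le \epsilon_V + \delta$$

From Theorem I we obtain then:

$$\mu_1(A) \le \mu_2(\cup B;\ B \in G,\ \rho(A,B) \le \epsilon_V + \delta) + \epsilon_V + \delta ;\ \forall A \in G \qquad (a_7)$$

Expression $(a_7)$ is true only for $\delta > 0$.

Therefore,

$$\inf\{\epsilon : \mu_1(A) \le \mu_2(\cup B;\ B \in G,\ \rho(A,B) \le \epsilon) + \epsilon ;\ \forall A \in G\} = \epsilon_V \qquad (a_8)$$

But, as shown in the proof of Lemma 2,

$$\inf\{\epsilon : \mu_1(A) \le \mu_2(\cup B;\ B \in G,\ \rho(A,B) \le \epsilon) + \epsilon ;\ \forall A \in G\} = \sup_{A \in G} |\mu_1(A) - \mu_2(A)| = d_K(\mu_1,\mu_2) \qquad (a_9)$$

From $(a_8)$ and $(a_9)$ we conclude that for the measure $\rho(A,B)$ as expressed
by Lemma 3, it is true that:

$$d_{V,\rho}(\mu_1,\mu_2) = d_K(\mu_1,\mu_2)$$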
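On a finite space the statement of Lemma 3 can be illustrated directly. The sketch below is a hedged example with hypothetical probability vectors `p` and `q`: it computes $\sup_A |\mu_1(A) - \mu_2(A)|$ by brute force and builds one joint measure (the standard maximal coupling) whose mass off the diagonal, that is $r(A \ne B)$, equals that supremum. It exhibits a joint measure attaining the value and does not replace the proof.

```python
import itertools

def sup_set_difference(p, q):
    """sup over all subsets A of |mu1(A) - mu2(A)|, by brute force."""
    n = len(p)
    best = 0.0
    for k in range(n + 1):
        for s in itertools.combinations(range(n), k):
            best = max(best, abs(sum(p[i] - q[i] for i in s)))
    return best

def maximal_coupling(p, q):
    """Joint measure with marginals p and q (both summing to one)
    that keeps as much mass as possible on the diagonal."""
    n = len(p)
    common = [min(p[i], q[i]) for i in range(n)]
    gap = 1.0 - sum(common)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        r[i][i] = common[i]
    if gap > 0.0:
        for i in range(n):
            for j in range(n):
                # The product vanishes for i == j, so only off-diagonal cells grow.
                r[i][j] += (p[i] - common[i]) * (q[j] - common[j]) / gap
    return r

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
r = maximal_coupling(p, q)
off_diagonal = sum(r[i][j] for i in range(3) for j in range(3) if i != j)
print(off_diagonal, sup_set_difference(p, q))
```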


Appendix  B 


Proof of result in 4aI

Denote

$$x = \frac{C_n R_{n2} C_n'}{C_n R_{n1} C_n'}$$

Then the distance depends on $C_n$ only through $x$; denote it by $g(x)$ $(b_1)$. This $g(x)$ is
monotonically decreasing from $x < 1$ to $x = 1$ and monotonically increasing
from $x = 1$ to $x > 1$.

Therefore, if there are any restrictions on the value
of $x$, $g(x)$ will assume its maximum for either the minimum or the
maximum $x$ allowed. Now, if we define $D = \{d_i ;\ i = 1, \ldots, n\} = C_n W$,
where $R_{n1} = W W'$, we obtain:

$$x = \frac{C_n R_{n2} C_n'}{C_n R_{n1} C_n'} = \sum_{i=1}^{n} \lambda_i \frac{d_i^2}{\sum_{j=1}^{n} d_j^2} \qquad (b_2)$$

where $L = \mathrm{diag}\{\lambda_i\}$ ; $R_{n2} = W L W'$.

For every $i$, $0 \le \dfrac{d_i^2}{\sum_{j=1}^{n} d_j^2} \le 1$, $\lambda_i > 0$ and $\sum_{i=1}^{n} \dfrac{d_i^2}{\sum_{j=1}^{n} d_j^2} = 1$.

Therefore, the maximum value $x$ can take is equal to $\lambda_{max}$, which is
the maximum eigenvalue in $L$, and this is realized by

$$D = [0, \ldots, 0, k, 0, \ldots, 0] = C_n W \qquad (b_3)$$

with $k$ in the position corresponding to $\lambda_{max}$, where $k$ is any constant.

From $(b_3)$ we obtain

$$C_{n\,max} = k\,[0, \ldots, 0, 1, 0, \ldots, 0]\, W^{-1} \qquad (b_4)$$

with the 1 in the position of $\lambda_{max}$.

The minimum value $x$ can take is $\lambda_{min}$, and this similarly leads to:

$$C_{n\,min} = k\,[0, \ldots, 0, 1, 0, \ldots, 0]\, W^{-1} \qquad (b_5)$$

with the 1 in the position of $\lambda_{min}$.

Applying the restriction $C_n R_{n1} C_n' \le A$ to $(b_4)$ and $(b_5)$ we obtain: $|k| \le \sqrt{A}$.
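The eigenvalue argument can be illustrated numerically in two dimensions. The sketch below is a minimal example under stated assumptions: `R1` and `R2` are hypothetical positive definite matrices, and the $\lambda_i$ are computed as the eigenvalues of $R_{n1}^{-1} R_{n2}$, which coincide with the diagonal entries of $L$ when $R_{n1} = W W'$ and $R_{n2} = W L W'$. The helper names are hypothetical.

```python
import math
import random

R1 = ((2.0, 0.5), (0.5, 1.0))     # hypothetical R_n1
R2 = ((1.0, -0.3), (-0.3, 3.0))   # hypothetical R_n2

def quad(c, r):
    """Quadratic form c r c' for a 2-vector c and a 2x2 matrix r."""
    return sum(c[i] * r[i][j] * c[j] for i in range(2) for j in range(2))

def ratio(c):
    """x = (c R2 c') / (c R1 c')."""
    return quad(c, R2) / quad(c, R1)

# N = R1^{-1} R2; its eigenvalues are the lambda_i.
det1 = R1[0][0] * R1[1][1] - R1[0][1] * R1[1][0]
R1_inv = ((R1[1][1] / det1, -R1[0][1] / det1), (-R1[1][0] / det1, R1[0][0] / det1))
N = [[sum(R1_inv[i][k] * R2[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]
tr = N[0][0] + N[1][1]
dt = N[0][0] * N[1][1] - N[0][1] * N[1][0]
disc = math.sqrt(tr * tr - 4.0 * dt)
lam_min, lam_max = (tr - disc) / 2.0, (tr + disc) / 2.0

def eigen_direction(lam):
    """Vector v with N v = lam v (valid here because N[0][1] != 0)."""
    return (N[0][1], lam - N[0][0])

# Random reductions stay between the extreme eigenvalues ...
random.seed(3)
ratios = [ratio((random.uniform(-1, 1), random.uniform(-1, 1))) for _ in range(2000)]
print(lam_min, min(ratios), max(ratios), lam_max)
# ... and the extremes are attained along the eigen-directions.
print(ratio(eigen_direction(lam_max)), ratio(eigen_direction(lam_min)))
```

The ratio does not depend on the amplitude of the reduction, which is why the power restriction only bounds $|k|$.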


Proof  for  4al 


Denote

$$x = \frac{C_n R_{n2} C_n'}{C_n R_{n1} C_n'}$$
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For  X > 1 


2-lnx  is  monotonical  ly  decreasing  from  <=  to  zero  with  x increasing 


from  X = 1 to  infinity. 

Also,  (x-l)J  increases  monotoni cal  ly  for  x increasing  from  x = 1 

I x^-1 

to  infinity. 


Therefore,  f | x j is  increasing  monoton i cal  1 y with 

X increasing  from  x = I to  infinity.  Similarly,  ^ x 

is  monotonically  increasing  with  x decreasing  from  x = 1 to  . 


Proof  for  4aIT: 

C R 

Denote  x = — J “ ^n  *^nl  ^n 

C R , C 
n nl  n 

and let us consider that $x < 1$. For $x > 1$ a symmetric procedure holds.

Then,  the  Levy  distance  in  (40)  becomes: 

d = inf  [t:Hh  - $ (y  + !— ill ) < e ; Vy)  (b  ) 

'P  J.. 

if  the  inequality  in  (b^)  should  be  true  for  every  y and  for  given  e,x, 

it  is  sufficient  that  it  is  true  for  this  y that  obtains  the  g(y)  = $(1)  _ 
-1 

' / \ 

- f(y  + P maximum.  Taking  the  first  derivative  of  g(y)  we  find 

-I  u^ 

g,  (y)  = i cp(Ji)  . cp  (y  + B_lii  ) . ^hgre  cp(u)  = -P^~^  ^ 


The derivative $g'(y)$ is nonnegative for those $y$'s that satisfy:
^ i ^ (y  + 

^ ^ Oj 
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Or  for; 


x2  a, 


< y < 


^ -1 
X 


^Uh  2(^  -1)  tn  i 

o,  il  a. 


"1^-'  2 


1 


-1 


Due to this and the fact that $g(-\infty) = 0$, we see that $g(y)$ obtains

y = 1 |_J< I X 

\ -I 

X 

^nd  for  X < 1 the  distance  in  (b  ) becomes: 

6 


, ■"Y  ^ > + 2(^  -1)  In^ 

■ '"f 


JL  P * (e)  + 


«1  ,-P  Ve)  ,2 

p-<  -Of^  ) + 


2<TaUnl 


) ^ 


-1 

X 


<1.,) 


If the $x < 1$ that makes the distance in $(b_8)$ maximum can be found,
the transformation that obtains this $x$ can also be found, as in the
cases of the Bhattacharyya, I-divergence, variational and Kolmogorov distances.
However, the task proves to be such that only numerical methods can approach it.
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Proof  for  4bT 


For  the  equal  covariance  case  in  4b,  the  Levy  distance  is  given  by: 

1 CM 

+ r — ) < c 


L,p  IT’  2T 


CR.C  CRC"^- 
n n n n n n 


f(x  + 


- $(x  + 


C R „ C ' ' C R „ C' 

n n n n n n 


) < e ; Vx  } 


Denote: 


_ I CM 

g (X)  = $(X)  - $(X  + g ) 

Ti^n^n  ^n’^n^n 


(b„) 


gjCx)  - Hx  + — - -^)  - Kx  + ) 

^nnri  CnKn^n 


;cx)  - ,(x) . ^(x  + e1^  ^ , 

n n n n n n 


a 92 (x)  , 


= 92<^^  = cp(x  + ~ — ^ ^ ) -<?  (x  + 


n n n 


C R C 
n n n 


<^4> 


9j(x)  is  positive  for: 


-1  CM  _1  CM 

X 0 (e)  n n o '(g)  n n 

(2x  + + •/  ) > o 


7-  + y ) + y 

CRC  CRC  CRC  CRC 

nnn  nnn  nnn  nnn 


<^5> 


g^Cx)  is  positive  for: 


(2x  + ^ -n-n  ^ ) (a  -iel,..,  . P-  .--  > > o 

CRC  C R C ''C  R C e.  R c' 
nnn  nnn  nnn  nnn 


Let  e be  a candidate  for  d (f._,f„_)  in  (b,.) 
I Il,p  iT  iCT  1 0 


I’C'-'.r* 
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Then, 

Ijl  If  p'Ve,)  > I I 

/ / 

both  9j(x),  g^Cx)  are  positive  for  x>  - 


p (e.)  + 

I n n 

2 C R C 
n n n 


That means that both $g_1(x)$, $g_2(x)$ obtain their maximum at either $x = +\infty$ or
$x = -\infty$, and this value is zero. So this case is trivial.

' i • If  P (Sj)  < ; p Vgj)  > o 

- tn 

g,(x)  IS  positive  for  x > - t 

' 2 C„  R„  C„ 

n ^ ^ 

and  g^Cx)  is  positive  for  : x < - P ~^  ^n^n 


2 C R C 
n n n 

That  means  that  gj(x)  obtains  maximum  for  x = + =>  and  this  maximum  is 
zero  (trivial),  while  g„(x)  obtains  maximum  for  x = -^ 

2 r-"  ■ ■ 

2 C R C 
n n n 

and  this  maximum  is  equal  to: 


P \e,) 


- I ..  ) - $ ' 

C - 2C  R.  C -- 


n -n  n 


and  (b  -)  reduces  the  search  for  d (f,_,f  to  finding  the  infinuim 

such  that  the  expression  in  does  not  exceed  . 

iii. The case $C_n M_n < \rho^{-1}(\epsilon_1) < -C_n M_n$ is symmetric to ii.
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